Quality report of the IPBES IAS bibliography

Introduction

This report assesses the following aspects of the provided bibliography, bibliography_metrics.rds:

Remarks:

  • The group ID is included in the JSON export but not in the CSV export. It makes it possible to jump directly to the reference in the online Zotero library.

Bibliography Setup

The bibliography is loaded and the DOIs, ISBNs and ISSNs are extracted. In a second step, the corresponding works are downloaded from OpenAlex (https://openalex.org).

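library(targets)

# Note: both objects are read from the same target (ias_bibliography), so the
# ILK comparison at the end of this report compares the bibliography with itself.
ilk_bibliography <- tar_read(ias_bibliography)
bibliography <- tar_read(ias_bibliography)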
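The download step can be sketched as follows. This is not the report's implementation (the targets pipeline is not shown here); it only illustrates how DOIs could be grouped into batched requests against the OpenAlex works endpoint. The helper name `openalex_doi_urls()`, the batch size of 50, and the assumption that the endpoint accepts bare DOIs in an OR filter of the form `filter=doi:A|B` are all assumptions of this sketch.

```r
# Hypothetical helper: build one OpenAlex request URL per batch of DOIs.
# Assumption: the works endpoint accepts an OR filter "doi:A|B|C" with bare DOIs.
openalex_doi_urls <- function(dois, batch_size = 50) {
  dois <- unique(dois[!is.na(dois)])
  if (length(dois) == 0) return(character(0))
  # Split the DOIs into consecutive batches of at most `batch_size`
  batches <- split(dois, ceiling(seq_along(dois) / batch_size))
  vapply(batches, function(batch) {
    paste0(
      "https://api.openalex.org/works?filter=doi:",
      paste(batch, collapse = "|"),
      "&per-page=", batch_size  # one page is enough for a full batch
    )
  }, character(1), USE.NAMES = FALSE)
}

urls <- openalex_doi_urls(c("10.1111/emr.12464", "10.1111/mam.12277", NA))
urls
```

The URLs could then be fetched, for example with `jsonlite::fromJSON()`; DOIs containing characters such as `&` or `#` would additionally need URL encoding.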

Data Quality of the Bibliography

Cleanliness of bibliography

One measure of the cleanliness of a bibliography is the number of references that have a DOI. The overview below summarises the DOIs, ISBNs and ISSNs in the bibliography.

Entries with DOIs, ISBNs or ISSNs

The DOI is the most widely used identifier for a reference. The list below gives, for each identifier type, the number of references that have it, their share of all references, and the number of duplicates.

Duplicate ISBNs or ISSNs do not necessarily indicate duplicate entries in the library: different chapters of a book can be separate entries that share the book's ISBN, and articles from the same journal share the journal's ISSN.

  • DOIs: 3863 (73.8%), 2 duplicates
  • ISBNs: 453 (8.7%), 57 duplicates
  • ISSNs: 3301 (63.1%), 2118 duplicates
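The three figures per identifier type can be reproduced with a small helper like the one below. It is a minimal sketch, assuming that each identifier type is stored as one character vector with `NA` (or an empty string) for references without that identifier; `identifier_summary()` is a hypothetical name, and the report's own counts may be computed differently (for example when a reference carries several ISBNs).

```r
# Hypothetical helper: count, share and duplicates for one identifier type.
# Assumes one identifier per reference, NA or "" where it is missing.
identifier_summary <- function(ids, n_refs = length(ids)) {
  present <- ids[!is.na(ids) & nzchar(ids)]
  n <- length(present)
  data.frame(
    count = n,
    percent = round(100 * n / n_refs, 1),
    # every occurrence after the first one counts as a duplicate
    duplicates = n - length(unique(present))
  )
}

dois <- c("10.1111/emr.12464", NA, "10.1111/emr.12464", "10.1111/mam.12277")
identifier_summary(dois)  # count 3, percent 75, duplicates 1
```

With the totals reported above, `round(100 * 3863 / 5231, 1)` gives the 73.8% for DOIs, 5231 being the total number of references (the sum of the publication type counts further below).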

The following DOIs occur more than once in the bibliography. Ideally, this table is empty.

# Duplicate ISBNs and ISSNs are not listed, because they do not necessarily
# indicate duplicate entries (see above):
# duplicate_isbns <- paste0("https://isbnsearch.org/search?s=", bibliography$isbns[duplicated(bibliography$isbns)])
# duplicate_issns <- bibliography$issns[duplicated(bibliography$issns)]

# Every occurrence of a DOI after its first one is a duplicate
duplicate_dois <- bibliography$dois_bib[duplicated(bibliography$dois_bib)]

data.frame(
    # rep() keeps the data frame valid when there are no duplicates
    Type = rep("doi", length(duplicate_dois)),
    Identifier = sprintf('<a href="https://doi.org/%s" target="_blank">%s</a>', duplicate_dois, duplicate_dois)
) |>
    knitr::kable(
        caption = "Duplicate DOIs in the Bibliography",
        escape = FALSE
    )
Duplicate DOIs in the Bibliography
Type Identifier
doi 10.1111/emr.12464
doi 10.1111/mam.12277

DOIs in OpenAlex

To check whether the DOIs exist and are valid, we test whether they are present in the OpenAlex database.

# Unique bibliography DOIs that were not returned by OpenAlex
dois_not_in_oa <- setdiff(bibliography$dois_bib, bibliography$dois_works)

# Of those, keep the ones that pass the syntactic validity check
dois_valid <- dois_not_in_oa[IPBES.R::doi_valid(dois_not_in_oa)]

Of the 3861 unique DOIs in the library, 471 (12.2%) are not in OpenAlex.
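A sketch of the comparison behind this number is shown below. It assumes that `dois_bib` holds the DOIs from the bibliography and `dois_works` the DOIs of the retrieved OpenAlex works. Unlike the code above, it lower-cases both sides (DOI names are case-insensitive) and strips a `https://doi.org/` prefix, which OpenAlex work records use in their `doi` field. The function name is hypothetical.

```r
# Hypothetical helper: unique bibliography DOIs that are missing from OpenAlex.
dois_missing_in_openalex <- function(dois_bib, dois_works) {
  bib <- unique(tolower(dois_bib[!is.na(dois_bib)]))
  works <- tolower(dois_works[!is.na(dois_works)])
  # OpenAlex stores DOIs as resolver URLs; compare bare DOIs instead
  works <- sub("^https?://(dx\\.)?doi\\.org/", "", works)
  setdiff(bib, works)
}

not_found <- dois_missing_in_openalex(
  dois_bib = c("10.1111/EMR.12464", "10.1111/mam.12277", "10.1038/26886"),
  dois_works = c("https://doi.org/10.1111/emr.12464", "10.1111/mam.12277")
)
not_found  # "10.1038/26886"
```

Applied to the figures above, `round(100 * 471 / 3861, 1)` gives the reported 12.2%.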

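data.frame(
    # rep() keeps the data frame valid when every DOI was found
    Type = rep("doi", length(dois_not_in_oa)),
    Identifier = sprintf('<a href="https://doi.org/%s" target="_blank">%s</a>', dois_not_in_oa, dois_not_in_oa)
) |>
    IPBES.R::table_dt(caption = "DOIs in the Bibliography not Found in OpenAlex")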

Of these, 66 are not valid according to IPBES.R::doi_valid(). These are:

# DOIs missing from OpenAlex that also fail the validity check
dois_invalid <- setdiff(dois_not_in_oa, dois_valid)

data.frame(
    Type = rep("doi", length(dois_invalid)),
    Identifier = sprintf('<a href="https://doi.org/%s" target="_blank">%s</a>', dois_invalid, dois_invalid)
) |>
    knitr::kable(
        caption = "Invalid DOIs in the Bibliography",
        escape = FALSE
    )
Invalid DOIs in the Bibliography
Type Identifier
doi 10.1641/0006-3568(2004)054[0613:TGISIN]2.0.CO;2
doi 10.1614/0043-1745(2003)051[0247:EPIATO]2.0.CO;2
doi 10.4102/koedoe. v53i2.1049
doi 10.1890/1051-0761(2002)012[0618:ATATOU]2.0.CO;2
doi 10.1645/0022-3395(2001)087[1366:TSNACI]2.0.CO;2
doi 10.2981/0909-6396(2007)13[159:PCOCMC]2.0.CO;2
doi 10.1641/0006-3568(2002)052[0593:BMTACA]2.0.CO;2
doi 10.1641/0006-3568(2001)051[0095:HAGPDG]2.0.CO;2
doi 10.1890/1540-9295(2003)001[0307:BBAB]2.0.CO;2
doi 10.1890/1540-9295(2004)002[0265:TOOATI]2.0.CO;2
doi 10.1890/1540-9295(2005)003[0495:TWOTRD]2.0.CO;2
doi https://conbio.onlinelibrary.wiley.com/doi/epdf/10.1111/j.1755-263X.2011.00164.x
doi 10.1890/0012-9658(2006)87[3109:CDGIAL]2.0.CO;2
doi doi.org/10.2307/1311536
doi 10.1002/(SICI)1096-9063(199703)49:3%3C213::AID-PS516%3E3.0.CO;2-%23
doi 10.1890/1540-9295(2003)001[0197:SODECA]2.0.CO;2
doi 10.1890/1540-9295(2007)5[365:ANGOCE]2.0.CO;2
doi doi.org/10.1038/26886
doi 10.1659/0276-4741(2006)026[0080:MANRNC]2.0.CO;2
doi 10.1641/0006-3568(2004)054[0337:DDPBWF]2.0.CO;2
doi 10.1890/0012-9658(1999)080[2045:SLRACF]2.0.CO;2
doi 10.1029/2011GL046583@10.1002/(ISSN)1944-8007.GRL40
doi 10.2111/1551-5028(2007)60[225:KAITPO]2.0.CO;2
doi 10.1890/1540-9295(2005)003[0071:BCOIAP]2.0.CO;2
doi 10.1641/0006-3568(2005)055[0518:DRFIIL]2.0.CO;2
doi 10.1890/1051-0761(2000)010[0689:BICEGC]2.0.CO;2
doi 10.1890/1540-9295(2005)003[0012:ISPMAF]2.0.CO;2
doi 10.1890/1540-9295(2006)4[428:LFAMIT]2.0.CO;2
doi 10.1641/0006-3568(2006)56[931:SAEITM]2.0.CO;2
doi 10.1653/0015-4040(2007)90[723:AOOTRI]2.0.CO;2
doi 10.1641/0006-3568(2000)050[0053:EAECON]2.3.CO;2
doi 10.1579/0044-7447(2008)37[114:ECITBS]2.0.CO;2
doi 10.1641/0006-3568(2004)054[0321:TGDONM]2.0.CO;2
doi 10.1641/0006-3568(2001)051[0103:HAAPOI]2.0.CO;2
doi 10.1577/1548-8667(2000)012<0001:IVIAIP>2.0.CO;2
doi 10.1890/1051-0761(2002)012[0927:AAMIIP]2.0.CO;2
doi 10.1890/1051-0761(2006)016[2035:BIRFUP]2.0.CO;2
doi 10.1002/(SICI)1099-145X(199905/06)10:3<225::AID-LDR337>3.0.CO;2-T
doi 10.1641/0006-3568(2003)053[0843:AMEASP]2.0.CO;2
doi 10.1890/1540-9295(2007)5[199:IASIAE]2.0.CO;2
doi 10.1641/0006-3568(2002)052[0801:EEIINA]2.0.CO;2
doi 10.1641/0006-3568(2005)055[0780:TOOTNP]2.0.CO;2
doi 10.2984/1534-6188(2007)61[469:rroaao]2.0.co;2
doi 10.1890/1540-9295(2004)002[0131:BBWAAO]2.0.CO;2
doi 10.1093/aobpla/plv085
doi annurev.es.23.110192.000431
doi 10.1641/0006-3568(2003)053[0703:BAFSIT]2.0.CO;2
doi 10.1002/1522-2632(200011)85:5/6<609::AID-IROH609>3.0.CO;2-S
doi 10.1890/1051-0761(2001)011[1750:ESITIS]2.0.CO;2
doi 10.2983/0730-8000(2007)26[281:BAAESA]2.0.CO;2
doi 10.1659/0276-4741(2002)022[0159:IOPPOS]2.0.CO;2
doi 10.1371/ journal.pone.0111913
doi doi.org/10.1023/A:1010090414598
doi https://www.sciencedirect.com/science/article/abs/pii/S0169534709002018
doi 10.1890/1540-9295(2006)004[0075:EACUDO]2.0.CO;2
doi 10.1111/j.1744-7429.201 0.00647.x
doi https://link.springer.com/article/10.1007/s10530-012-0347-1
doi 10.1641/0006-3568(2002)052[0464:Tgohnc]2.0.Co;2
doi 10.1641/0006-3568(2005)055[0335:SPFRTT]2.0.CO;2
doi 10.1641/0006-3568(2004)054[0677:EOIAPO]2.0.CO;2
doi 10.1890/1540-9295(2007)5[153:IPIIAR]2.0.CO;2
doi 10.1890/1540-9295(2004)002[0354:APFASI]2.0.CO;2
doi 10.1614/0043-1745(2000)048[0255:IWIRSI]2.0.CO;2
doi 10.1641/0006-3568(2000)050[0239:TAGISF]2.3.CO;2
doi 10.1890/1540-9295(2005)003[0103:TTOTCR]2.0.CO;2
doi 10.2984/1534-6188(2007)61[307:BAIOPI]2.0.CO;2
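Many entries in this list have formatting problems that can be repaired automatically: resolver or publisher URL prefixes (`doi.org/...`, `https://link.springer.com/article/...`) or spaces inside the DOI (`10.1371/ journal.pone.0111913`). Others, such as `10.1641/0006-3568(2004)054[0613:TGISIN]2.0.CO;2`, appear to be SICI-style DOIs whose brackets and semicolons are permitted in DOI names, so whether they are flagged depends on the pattern the validity check uses. Entries without the `10.` prefix (`annurev.es.23.110192.000431`) or without any DOI (the ScienceDirect link) still need manual correction. The helpers below are a hypothetical sketch, not part of IPBES.R; the permissive pattern in `looks_like_doi()` is an assumption, and percent-encoded DOIs (e.g. `%3C`) are not decoded.

```r
# Hypothetical helper: repair URL prefixes and embedded whitespace in DOIs.
normalise_doi <- function(x) {
  x <- trimws(x)
  # Keep everything from the first "10.<registrant>/" onwards (drops URL prefixes)
  x <- sub("^.*?(10\\.[0-9]{4,9}/)", "\\1", x, perl = TRUE)
  # Remove whitespace that was inserted inside the DOI
  gsub("[[:space:]]+", "", x)
}

# Deliberately permissive syntax check (assumption): "10.", a 4-9 digit
# registrant code, "/", and a non-empty suffix without whitespace.
looks_like_doi <- function(x) {
  grepl("^10\\.[0-9]{4,9}/[^[:space:]]+$", x, perl = TRUE)
}

normalise_doi(c("doi.org/10.2307/1311536", "10.4102/koedoe. v53i2.1049"))
```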

TODO Finally, we check whether the valid DOIs that are missing from OpenAlex exist but have not been ingested into OpenAlex. This is done using the doi.org resolver. This check is currently disabled.

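# Check via the doi.org resolver whether the valid DOIs that are missing
# from OpenAlex exist (results cached in cache/doi_exist.rds)
dois_exist <- IPBES.R::doi_exists(
    dois_valid,
    cache_file = file.path(".", "cache", "doi_exist.rds")
)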
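The resolver check could look roughly like this in base R. It is a sketch under the assumption that doi.org answers a request for a registered DOI with a redirect (3xx) and an unknown DOI with 404; `doi_resolver_url()` and `doi_resolves()` are hypothetical helpers and not the implementation of `IPBES.R::doi_exists()`.

```r
# Hypothetical helper: resolver URLs, with spaces and similar characters percent-encoded.
doi_resolver_url <- function(doi) {
  encoded <- vapply(doi, utils::URLencode, character(1), reserved = FALSE, USE.NAMES = FALSE)
  paste0("https://doi.org/", encoded)
}

# Hypothetical helper: TRUE for a redirect, FALSE for any other status,
# NA if the request itself failed (e.g. no network access).
doi_resolves <- function(doi) {
  status <- tryCatch(
    attr(curlGetHeaders(doi_resolver_url(doi), redirect = FALSE), "status"),
    error = function(e) NA_integer_
  )
  if (is.na(status)) NA else status >= 300 && status < 400
}

# Example (needs network access):
# vapply(c("10.1038/26886", "10.1111/emr.12464"), doi_resolves, logical(1))
```

Checking one DOI at a time is slow for hundreds of DOIs, which is one reason to cache the results, as the `cache_file` argument above does.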
# DOIs from the bibliography that were not returned by OpenAlex
to_check <- bibliography$dois_bib[!(bibliography$dois_bib %in% bibliography$dois_works)]

dois_valid <- IPBES.R::doi_valid(bibliography$dois_bib)
dois_openalex <- bibliography$dois_bib %in% bibliography$dois_works
names(dois_openalex) <- bibliography$dois_bib

# Existence is only checked for the DOIs that are missing from OpenAlex
dois_exist <- IPBES.R::doi_exists(to_check, cache_file = file.path(".", "cache", "doi_exist.rds"))
dois_not_retracted <- IPBES.R::doi_not_retracted(bibliography$dois_bib, cache_file = file.path(".", "cache", "doi_not_retracted.rds"))

# `bibliography` is a list; the references themselves are in bibliography$bibliography
n_references <- nrow(bibliography$bibliography)
n_dois <- sum(!is.na(bibliography$dois_bib))
n_duplicate_dois <- length(bibliography$dois_bib) - length(unique(bibliography$dois_bib))

sprintf(
    fmt = paste(
        "Number of references: \t\t %d",
        "Number of DOIs: \t\t %d",
        "Number of Duplicate DOIs: \t %d",
        "Number of DOIs in OpenAlex: \t %d (%.1f%%)",
        "Number of Existing DOIs: \t %d",
        "Number of Retracted DOIs: \t %d",
        "Percentage of Duplicate DOIs: \t %.1f",
        sep = "\n"
    ),
    n_references,
    n_dois,
    n_duplicate_dois,
    sum(dois_openalex), 100 * sum(dois_openalex) / n_references,
    sum(dois_exist),
    sum(!dois_not_retracted),
    100 * n_duplicate_dois / n_dois
) |> cat()
# Show NA values as empty cells (used for the section header rows)
oldopts <- options(knitr.kable.NA = "")
data.frame(
    Measure = c(
        "# References",
        "**DOI**",
        "# DOIs",
        "# Duplicate DOIs",
        "# Existing DOIs",
        "# Retracted DOIs",
        "% Duplicate DOIs",
        "**ISBN**",
        "# ISBNs",
        "# Duplicate ISBNs",
        "**ISSN**",
        "# ISSNs",
        "# Duplicate ISSNs"
    ),
    Value = c(
        nrow(bibliography$bibliography),
        NA,
        sum(!is.na(bibliography$dois_bib)),
        length(bibliography$dois_bib) - length(unique(bibliography$dois_bib)),
        sum(dois_exist),
        sum(!dois_not_retracted),
        round(100 * (length(bibliography$dois_bib) - length(unique(bibliography$dois_bib))) / sum(!is.na(bibliography$dois_bib)), 1),
        NA,
        sum(!is.na(bibliography$isbns)),
        length(bibliography$isbns) - length(unique(bibliography$isbns)),
        NA,
        sum(!is.na(bibliography$issns)),
        length(bibliography$issns) - length(unique(bibliography$issns))
    )
) |>
    knitr::kable(
        caption = "Cleanliness of the Bibliography"
    )
options(oldopts)

Content and Bibliographic Analysis

Publication types

# Number of references per Zotero item type, most frequent first
bibliography$bibliography |>
    dplyr::count(Item.Type, name = "count", sort = TRUE) |>
    knitr::kable()
Item.Type          count
journalArticle      4157
bookSection          305
book                 244
report               236
webpage               97
conferencePaper       83
document              67
thesis                15
dataset               12
blogPost               9
preprint               2
magazineArticle        1
newspaperArticle       1
presentation           1
videoRecording         1
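The counts in this table sum to 5231, the total number of references, which is also the denominator of the DOI, ISBN and ISSN percentages given earlier (3863 / 5231 ≈ 73.8%). The share of each type can be computed from the counts as below; the numbers are copied from the table and the snippet is only an illustration.

```r
# Counts copied from the table above
item_types <- c(
  journalArticle = 4157, bookSection = 305, book = 244, report = 236,
  webpage = 97, conferencePaper = 83, document = 67, thesis = 15,
  dataset = 12, blogPost = 9, preprint = 2, magazineArticle = 1,
  newspaperArticle = 1, presentation = 1, videoRecording = 1
)

sum(item_types)  # 5231 references in total

# Percentage of all references per item type, one decimal
shares <- round(100 * prop.table(item_types), 1)
head(shares, 4)  # journalArticle 79.5, bookSection 5.8, book 4.7, report 4.5
```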

Year of Publication

tar_read(plot_pub_year_ias)
Warning: Removed 40 rows containing missing values or values outside the scale range
(`geom_line()`).
Warning: Removed 196 rows containing missing values or values outside the scale range
(`geom_line()`).

Access Status of References

The access status is determined from the works retrieved from OpenAlex, so it is limited to references that are found in OpenAlex. At the moment, only references with a DOI are retrieved from OpenAlex.

tar_read(plot_oa_status_ias)
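Because the access status is only known for works found in OpenAlex, its shares can be expressed relative to two different totals. The sketch below makes this explicit; `oa_status` stands for the `open_access.oa_status` value of each retrieved OpenAlex work (values such as `gold`, `green` or `closed`), and the function name and toy input are hypothetical.

```r
# Hypothetical helper: OA status counts relative to the retrieved works
# and relative to the whole bibliography.
oa_status_summary <- function(oa_status, n_references) {
  counts <- table(oa_status, useNA = "no")
  n <- as.integer(counts)
  data.frame(
    status = names(counts),
    works = n,
    pct_of_retrieved = round(100 * n / sum(n), 1),
    pct_of_bibliography = round(100 * n / n_references, 1)
  )
}

# closed: 2 works, 50% of retrieved works, 20% of the 10 references
oa_status_summary(c("gold", "closed", "green", "closed"), n_references = 10)
```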

50 Most Frequently Cited Journals

tar_read(plot_top_journals_ias)

This table contains all journals as specified in the Zotero database.

tar_read(plot_top_journals_data_ias) |>
    IPBES.R::table_dt("cited_journals")

TODO Countries of the Institutions of All Authors

This plot only contains the countries with more than 10 references.

tar_read(plot_top_countries_ias)

This table contains all countries and their number of authorships.

tar_read(plot_top_country_data_ias) |>
    IPBES.R::table_dt("top_countries")
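The plot above counts references per country, whereas the table counts authorships; the two differ when several authors of one work are affiliated with institutions in the same country. The sketch below shows both variants together with the threshold of more than 10. It assumes one character vector of institution country codes per work, with one entry per authorship; the function name and the toy data are hypothetical.

```r
# Hypothetical helper: countries with more than `min_count` authorships
# (default) or references (per_reference = TRUE).
country_counts <- function(countries_per_work, min_count = 10, per_reference = FALSE) {
  if (per_reference) {
    # Count each country at most once per work
    countries_per_work <- lapply(countries_per_work, unique)
  }
  counts <- table(unlist(countries_per_work), useNA = "no")
  counts <- sort(counts[counts > min_count], decreasing = TRUE)
  data.frame(country = names(counts), n = as.integer(counts))
}

works <- c(rep(list(c("DE", "DE", "FR")), 6), rep(list("US"), 12))
country_counts(works)                        # DE and US, 12 authorships each
country_counts(works, per_reference = TRUE)  # only US (12 references)
```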

VOSviewer Example

ILK References in Bibliography

At the moment, this comparison uses the DOIs in the target and in the ILK bibliography, so entries without a DOI cannot be compared. A comparison based on the titles would make this possible, but is not implemented yet.

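Of the 3863 references with a DOI in the target bibliography, 3863 (100%) are also in the ILK bibliography (3863 references). Because ilk_bibliography is currently read from the same target as the IAS bibliography (tar_read(ias_bibliography)), this complete overlap is expected and does not reflect a real comparison.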
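A title-based comparison, which is not implemented in this report, could start from normalised titles as in the sketch below. Exact matching after lower-casing and removing punctuation is an assumption chosen for simplicity; it misses titles that differ by typos, for which an edit distance such as `utils::adist()` could be used instead. The function names are hypothetical.

```r
# Hypothetical helper: lower-case, replace punctuation by spaces, collapse whitespace.
normalise_title <- function(title) {
  t <- tolower(title)
  t <- gsub("[[:punct:]]+", " ", t)  # treat punctuation as word breaks
  t <- gsub("[[:space:]]+", " ", t)  # collapse runs of whitespace
  trimws(t)
}

# TRUE where a target title also occurs in the other bibliography; missing titles never match
match_by_title <- function(titles_target, titles_other) {
  x <- normalise_title(titles_target)
  !is.na(x) & x %in% normalise_title(titles_other)
}

match_by_title(
  c("Biological invasions: a global perspective", "Another title", NA),
  c("Biological Invasions - A Global Perspective")
)  # TRUE FALSE FALSE
```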